CS156

HW#1 --- last modified February 07 2019 23:07:54.

Solution set.

Due date: 1_DATE

Files to be submitted:
  Project.zip

Purpose: To learn about decision tree learning algorithms and neural nets.

Related Course Outcomes:

The main course outcomes covered by this assignment are:

LO12 -- Students should be able to describe or implement at least one learning algorithm

Specification:

For this homework, I want you to write a Python program which implements the decision tree learning algorithm from class and the book (CS185c and CS286 have the same HW this time). Your program will be run from the command line with a line like:

python dtlearner.py training_set.txt decisions_to_make.txt

Here the names training_set.txt and decisions_to_make.txt might be changed by the user of the program. The file training_set.txt consists of a header section, the last line of which looks like:

%%%

followed by lines, each of which is made up of a sequence of alphanumeric strings separated by tabs. The last such string in a line will always be Yes or No. An example file might look like:

Training Set For Movie Star
Columns correspond to:
Plastic Surgery
Teeth Color
Manicure
Pedicure
The last column says Yes is a movie star or No is not
%%%
FaceLift	Yellow	No	No	No
None	White	Yes	Yes	Yes
NoseJob	White	No	Yes	Yes
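
The file layout described above could be read along the following lines. This is only a minimal sketch: the function name read_examples and the choice to skip blank lines are assumptions made for illustration, not part of the assignment.

```python
def read_examples(lines):
    """Split the lines of a training_set.txt-style file into header and rows.

    Lines before the first line that is exactly %%% are treated as the
    header; each later non-blank line is split on tabs.
    """
    header = []
    rows = []
    in_header = True
    for raw in lines:
        line = raw.rstrip("\r\n")
        if in_header:
            if line.strip() == "%%%":
                in_header = False  # the %%% line ends the header section
            else:
                header.append(line)
        elif line.strip():  # assumption: blank lines are ignored
            rows.append(line.split("\t"))
    return header, rows


sample = [
    "Training Set For Movie Star\n",
    "%%%\n",
    "FaceLift\tYellow\tNo\tNo\tNo\n",
    "None\tWhite\tYes\tYes\tYes\n",
]
header, rows = read_examples(sample)
# In a training set, the last field of each row is the Yes/No label.
examples = [(row[:-1], row[-1]) for row in rows]
```

With a real file, an open file object can be passed in place of the list, since iterating over it also yields one line at a time.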

A typical file would contain many more examples. A decisions_to_make.txt file has the same format as the training set except that it doesn't have the last column. Given such input, your program should run the decision tree algorithm on the training set. At each step it should compute the information gain of each remaining variable to determine which is the most important, and so should be used next. After training, it should pretty print the decision tree that it got (your choice of format, but not using % as a character), then print a line %%%, followed by a line Yes or No for each line in decisions_to_make.txt. The nth line output should correspond to Yes or No for the nth line in decisions_to_make.txt.
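
As a rough illustration of the information gain step, the sketch below assumes the usual entropy-based definition (entropy of the labels minus the expected entropy after splitting on a variable); check it against the version given in class and the book. The function names, the (attributes, label) pair representation, and the list of remaining column indexes are hypothetical choices made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy, in bits, of a list of Yes/No labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(examples, column):
    """Entropy of the labels minus the expected entropy after splitting
    the examples on the attribute stored in the given column."""
    labels = [label for _, label in examples]
    groups = {}
    for attributes, label in examples:
        groups.setdefault(attributes[column], []).append(label)
    remainder = sum(
        len(group) / len(examples) * entropy(group)
        for group in groups.values()
    )
    return entropy(labels) - remainder


# The three rows from the example training file, as (attributes, label).
examples = [
    (["FaceLift", "Yellow", "No", "No"], "No"),
    (["None", "White", "Yes", "Yes"], "Yes"),
    (["NoseJob", "White", "No", "Yes"], "Yes"),
]
remaining = [0, 1, 2, 3]  # column indexes not yet used in the tree
# Assumption: ties are broken in favor of the earliest column.
best = max(remaining, key=lambda col: information_gain(examples, col))
```

On these three rows the Manicure column (index 2) has a lower gain than the other columns, because its No value still mixes Yes and No labels. With so few examples several columns tie for the highest gain, so how ties are broken is a design choice.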

Point Breakdown

PEP 8 coding guidelines followed, code seems reasonably elegant. 1pt
Program reads text files supplied in the command line arguments. 1pt
Program implements the decision tree algorithm from class and the book. 2pts
Information gain is used to determine importance in the DT algorithm. 2pts
Pretty Printed decision tree seems like a reasonable decision tree for the given inputs. 2pts
Program outputs values for rows in decisions_to_make.txt according to decision tree computed. 2pts
Total 10pts